Bilingual PRESRI - Integration of Multiple Research Paper Databases

Authors

  • Hidetsugu Nanba
  • Takeshi Abekawa
  • Manabu Okumura
  • Suguru Saito
Abstract

Collecting all the papers in a research field is a first step towards an exhaustive survey. A number of research paper databases are available for searching papers. However, when a research field is covered by multiple databases, searchers are compelled to repeat the same search operation for each one. To remedy this inefficiency, we have developed PRESRI, which constructs an exhaustive database by integrating multiple research paper databases. First, we collect PostScript and PDF files on the WWW and construct a database (‘WEB-DB’) by extracting bibliographic information from the files. Second, we construct an exhaustive database by integrating WEB-DB with other databases. As a key technique for constructing an exhaustive database, we propose an SVM-based method for extracting bibliographic information from PostScript and PDF files. To investigate the effectiveness of our method, we conducted an evaluation and found that it is useful for both Japanese and English. In this paper, we also focus on the presentation of search results, which is an important factor in constructing an efficient survey environment. We have developed a system that makes it possible to understand the relationships between papers intuitively, based on citation information.
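
The second step described above, integrating WEB-DB with other databases, can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how records are matched, so this example assumes that two records refer to the same paper when their normalized titles and years agree. The function names, the record fields (`title`, `year`, `source`) and the sample records are all hypothetical.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse punctuation and whitespace to single spaces."""
    text = unicodedata.normalize("NFKC", title).lower()
    # \w is Unicode-aware for str patterns, so Japanese characters are preserved.
    text = re.sub(r"[^\w]+", " ", text)
    return text.strip()


def integrate(*databases):
    """Merge record lists, keeping one record per (normalized title, year).

    Assumption: matching on title and year is enough to detect duplicates.
    """
    merged = {}
    for db in databases:
        for record in db:
            key = (normalize_title(record["title"]), record.get("year"))
            if key not in merged:
                # First sighting: copy the record and start its list of sources.
                merged[key] = dict(record, sources=[record["source"]])
            elif record["source"] not in merged[key]["sources"]:
                # Duplicate from another database: only note the extra source.
                merged[key]["sources"].append(record["source"])
    return list(merged.values())


# Made-up sample records for illustration only.
web_db = [
    {"title": "A Sample Paper on Citation Analysis", "year": 2004, "source": "WEB-DB"},
]
other_db = [
    {"title": "A sample paper on citation analysis.", "year": 2004, "source": "OTHER-DB"},
    {"title": "Another Sample Paper", "year": 2003, "source": "OTHER-DB"},
]

for paper in integrate(web_db, other_db):
    print(paper["title"], paper["sources"])
```

With such a merge, a single query against the integrated list replaces one query per database. A real system would likely also need fuzzy matching to cope with extraction errors in titles.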

Similar resources

Bilingual PRESRI

Collecting all the papers in a research field is a first step towards an exhaustive survey. A number of research paper databases, which are provided by libraries, publishing companies and academic societies, are available for searching papers. However, searchers are compelled to repeat the same search operation for each database if there are multiple databases for a research field. To improve s...

A Combination of Models for Bilingual Lexicon Extraction from Comparable Corpora

In this paper we present a method to extract bilingual terminologies from comparable non-aligned corpora by using multiple linguistic knowledge sources, such as non-parallel corpora, bilingual thesauri, and a preliminary bilingual dictionary. We focus on two core technologies: bilingual lexicon extraction from comparable corpora and expansion through thesauri categories based on different ...

Deep Linguistic Multilingual Translation and Bilingual Dictionaries

This paper describes the MulTra project, aiming at the development of an efficient multilingual translation technology based on an abstract and generic linguistic model as well as on object-oriented software design. In particular, we will address the issue of the rapid growth both of the transfer modules and of the bilingual databases. For the latter, we will show that a significant part of bil...

Identification of miR-24 and miR-137 as novel candidate multiple sclerosis miRNA biomarkers using multi-staged data analysis protocol

Many studies have investigated misregulation of miRNAs relevant to multiple sclerosis (MS) pathogenesis. Abnormal miRNAs can be used both as candidate biomarkers for MS diagnosis and for understanding the disease miRNA-mRNA regulatory network. In this comprehensive study, misregulated miRNAs related to MS were collected from existing literature, databases and via in silico prediction. A multi-staged...

Squirrel Phase 1: Generating Data Integration

This paper presents a framework for data integration that is based on using "Squirrel integration mediators" that use materialization to support integrated views over multiple databases. These mediators generalize techniques from active databases to provide incremental propagation of updates to the materialized views. A framework based on "View Decomposition Plans" for optimizing the support of...

Publication date: 2004